Deriving a Bilingual Lexicon for Cross Language Information Retrieval
Abstract
In this paper we describe a systematic approach to deriving a bilingual lexicon automatically from parallel corpora. Following this approach, a lexicon was derived from the English and Dutch versions of the Agenda corpus. With the lexicon and a part of the corpus that was not used to derive the lexicon, a bilingual retrieval environment was built. Recall and precision of monolingual Dutch retrieval were compared to recall and precision of bilingual Dutch-to-English retrieval. An experiment was conducted with the help of eight naive users, who formulated queries and judged the relevance of retrieved fragments. The experiment shows precision and relative recall of monolingual retrieval against precision and relative recall of bilingual retrieval.
Similar resources
Automatic processing of multilingual medical terminology: applications to thesaurus enrichment and cross-language information retrieval
OBJECTIVES: We present in this article experiments on multi-language information extraction and access in the medical domain. For such applications, multilingual terminology plays a crucial role when working on specialized languages and specific domains. MATERIAL AND METHODS: We propose firstly a method for enriching multilingual thesauri which extracts new terms from parallel corpora, and seco...
Bilingual Terminology Acquisition from Comparable Corpora and Phrasal Translation to Cross-Language Information Retrieval
This paper presents an approach to bilingual lexicon extraction from non-aligned comparable corpora and to phrasal translation, together with evaluations on Cross-Language Information Retrieval. A two-stage translation model is proposed for the acquisition of bilingual terminology from comparable corpora, disambiguation and selection of best translation alternatives according to their...
Bengali and Hindi to English Cross-language Text Retrieval under Limited Resources
This paper describes our experiment on two cross-lingual and one monolingual English text retrievals at CLEF in the ad-hoc track. The cross-language task includes the retrieval of English documents in response to queries in the two most widely spoken Indian languages, Hindi and Bengali. For our experiment, we had access to a Hindi-English bilingual lexicon, 'Shabdanjali', consisting of approx. 26K H...
Extraction of Cross Language Term Correspondences
This paper describes a method for extracting translations of terms across languages, using parallel corpora. The extracted term correspondences are useful when performing query expansion for cross language information retrieval, or for bilingual lexicon extraction. The method makes use of the mutual information measure and allows for mapping between single-word to multi-word t...
Automatic Term Extraction for Cross-Language Information Retrieval Using a Bilingual Parallel Corpus
Information retrieval is a crucial area of natural language processing (NLP) and can be defined as finding documents whose content is relevant to the query need of a user. Cross-language information retrieval refers to a kind of information retrieval in which the language of the query and that of the searched documents are different. This paper tries to construct a bilingual lexicon from an English...